1.  Introduction


Situation  awareness  is  the  perception  of  the  elements  in  the  environment  within  a 
volume  of  time  and  space,  the  comprehension  of  their  meaning  and  the  projection 
of  their  status  in  the  near  future. 

Endsley  (1988,  2000) 

One  of  the  limitations  of  the  current  generation  of  synthetic  forces  is  their  lack  of 
situation  awareness  and  understanding  (Pew  and  Mavor,  1998).  Situation  awareness  is 
critical  for  making  intelligent  decisions — without  it  there  is  no  context  for  adapting  one’s 
behavior  to  accommodate  the  current  and  future  states  of  the  world  (Klein,  2000;  Waag 
and  Bell,  1997).  Consider  a  helicopter  pilot  who  is  performing  a  reconnaissance  mission 
in an area behind enemy lines. The pilot must balance safety from enemy fire (by
staying close to the ground and scanning for enemy air defenses) against safety from
colliding with the ground and other helicopters, while simultaneously scanning for and
assessing  the  activity  of  enemy  units.  To  accommodate  the  competing  needs  of  these 
tasks,  the  pilot  continuously  adjusts  the  helicopter’s  flight  path.  To  avoid  hitting  the 
ground  or  other  aircraft,  the  pilot  tracks  the  helicopter’s  location  and  trajectory,  the 
location  of  the  terrain,  and  the  trajectories  of  the  other  aircraft.  To  avoid  detection  by  the 
enemy  the  pilot  stays  in  areas  that  provide  cover  and  concealment,  but  which  also  afford 
the  ability  to  track  enemy  units.  Once  the  pilot  locates  the  enemy  forces  he  then 
maneuvers  so  that  he  can  observe  the  vehicle  formations  to  the  largest  possible  extent 
while  still  maintaining  concealment. 

All three aspects of Endsley’s definition of situation awareness (Endsley, 1988, 2000) are
present  in  this  example.  First,  the  pilot  has  to  perceive  the  elements  in  the  environment — 
situation  awareness  is  not  possible  without  perception.  The  pilot  would  be  flying  blind  if 
he  or  she  did  not  acquire  periodic  updates  about  the  helicopter’s  state,  the  terrain  and  the 
people  and  vehicles  in  the  environment.  But  the  ability  to  perceive  is  limited  in  nature — 
humans  have  a  limited  field  of  view  and  can  only  attend  to  a  few  objects  at  a  time.  To 
compensate  for  this  limitation,  the  perceptual  system  must  have  a  means  of  focusing 
attention  on  some  objects  while  filtering  out  the  details  of  others.  Furthermore, 
perceiving  involves  knowing  where  to  look  and  how  long  to  look  at  an  object  before 
shifting  attention  to  another. 

Second,  merely  perceiving  the  elements  in  the  environment  is  not  sufficient — the  pilot 
must  also  comprehend  the  meaning  of  what  is  occurring  in  the  world.  The  pilot’s 
mission,  goals  and  tasks  provide  the  context  for  assigning  meaning  to  the  situation.  The 
situation  must  be  assessed  with  respect  to  the  latest  perceptual  inputs — is  the  adjacent 
helicopter  in  this  formation  on  a  safe  course?  Are  the  opposing  forces  moving  in  a 
column  formation  down  a  road?  Or  are  they  in  a  defensive  position?  What  kind  of  force 
is it: primarily tanks, or a group of trucks loaded with troops? What echelon is
being deployed? Answering these questions involves integrating what has been perceived
over  time  and  interpreting  it  with  respect  to  known  patterns  of  behavior.  If  a  pattern  is 
recognized,  it  can  be  used  to  infer  the  plans  and  goals  of  the  forces  under  observation. 


Finally,  after  a  situation  has  been  assessed  and  integrated  with  what  was  previously 
known,  the  model  can  then  be  used  to  predict  what  will  happen  in  the  future.  Predictions 
are  the  basis  for  forming  expectations  about  how  the  situation  will  evolve  (Dominguez, 
1994),  which  potentially  inform  the  perceptual  focus  of  attention  in  two  ways.  In  the 
near  term  a  prediction  allows  one  to  ignore  (i.e.,  not  observe)  certain  objects 

2.  Research  Summary 

The  research  performed  under  this  grant  from  the  Office  of  Naval  Research  (ONR) 
addressed  situation  awareness  by  building  models  in  virtual  humans.  A  virtual  human  is 
a  program  that  models  human  cognition  and  behavior  in  a  simulation  of  reality.  Virtual 
humans  are  playing  an  increasingly  important  role  in  military  battle  simulations, 
commercial  computer  and  video  games,  interactive  narrative  in  the  entertainment  domain, 
and  for  human  factors  analysis  in  engineering.  This  section  of  the  report  summarizes  our 
work  to  create  greater  situation  awareness  in  virtual  humans  in  three  areas:  perception, 
comprehension,  and  prediction. 

2.1  Perception 

Our  model  of  attention  is  based  on  psychological  principles:  (1)  Attention  controls 
perceptual  processing  -  it  is  the  system  for  controlling  the  way  information  is  routed  and 
for  controlling  processing  priorities  (Posner,  1987).  (2)  Perceptual  processing  occurs  in 
stages.  (3)  Attention  is  selective  -  it  focuses  on  objects  or  locations,  which  requires  the 
ability  to  filter  out  excess  information.  (4)  Top-down  and  bottom-up  processes  influence 
attention.  (5)  Attention  operates  like  a  zoom  lens  -  it  can  focus  on  a  small  area  with  high 
resolution  or  a  larger  area  with  lower  resolution. 

2.1.1  Perceptual  Attention 

We  developed  a  model  of  perceptual  attention  for  synthetic  helicopter  pilots  in  an  entity- 
level  battlespace  simulator  called  ModSAF*  (now  known  as  OneSAF).  The  pilots  were 
implemented  in  Soar,  a  well-known  computational  model  of  cognition  (Laird  et  al.,  1987; 
Rosenbloom  et  al.,  1991,  1993).  Early  versions  of  the  pilots  would  often  crash  their 
aircraft  when  they  encountered  situations  where  there  were  many  other  entities  in  the 
field of view. They lost control because they were, in a sense, paying
attention  to  everything  in  the  world  at  a  high  level  of  detail,  which  was  both 
computationally  expensive  and  humanly  unrealistic.  To  avoid  being  overwhelmed  by 
visual  stimuli,  what  was  needed  was  the  ability  to  selectively  focus  perceptual  attention 
on  only  the  details  that  were  necessary,  while  either  leaving  out  other  details,  or  else 
perceiving  them  at  a  more  abstract  level. 

2.1.2  Stages  of  Processing 

Perceptual  processing  is  performed  in  stages,  with  perceptual  (gestalt)  grouping  occurring 
in  the  pre-attentive  stage  along  with  some  filtering,  and  more  detailed  processing  in  the 
attentive  stage.  The  control  of  attention  is  primarily  goal-driven,  but  attention  can  be 
captured by stimuli such as the abrupt onset caused by explosions, motion, and salient
differences in "color". The zoom lens model of attention currently has two resolutions,
low and high. Perceptual grouping is dynamic and involuntary, although the pilot can
also voluntarily group visual objects for tracking purposes. The perceptual attention
model is integrated with Soar's decision cycle in a pilot that flies missions in a synthetic
battlespace, complete with terrain and other entities, which include tanks, trucks, and
dismounted infantry.

* ModSAF stands for Modular Semi-Automated Forces.

2.1.3  Selective  Attention 

To  enable  selective  attention  we  added  mechanisms  to  the  pilot’s  perceptual  system  that 
provide  the  ability  to  focus  on  specific  classes  or  types  of  entities,  with  the  rest  being 
filtered  in  earlier  stages  of  the  processing,  prior  to  reaching  the  agent’s  working  memory. 
These  filtering  mechanisms  build  directly  on  the  staged  approach  to  processing.  Entities 
can  be  selected  for  attention  based  on  the  attributes  of  distance,  color,  group  membership 
(e.g.,  member  of  proximally  clustered  group  of  vehicles),  vehicle  category  (e.g.,  tank, 
helicopter,  truck)  or  by  specific  vehicle  type  (e.g.,  T-70  tank,  ZSU-23-2  air  defense,  AH- 
64  helicopter). 
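The attribute-based selection described above can be sketched as a filter that sits between the simulator and working memory. This is a minimal illustration in Python, not the Soar/ModSAF implementation: the names Entity, SelectionCriteria and attend are hypothetical, the color attribute is omitted for brevity, and a criterion left unset is assumed to mean "do not filter on this attribute".

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Entity:
    name: str
    category: str       # e.g. "tank", "helicopter", "truck"
    vehicle_type: str   # specific type label used by the simulator
    distance_m: float   # range from the pilot, in meters
    group_id: int       # id of the proximity cluster the entity belongs to


@dataclass
class SelectionCriteria:
    """Hypothetical attention filter; a field left as None is not applied."""
    max_distance_m: Optional[float] = None
    categories: Optional[Set[str]] = None
    vehicle_types: Optional[Set[str]] = None
    group_ids: Optional[Set[int]] = None

    def admits(self, e: Entity) -> bool:
        if self.max_distance_m is not None and e.distance_m > self.max_distance_m:
            return False
        if self.categories is not None and e.category not in self.categories:
            return False
        if self.vehicle_types is not None and e.vehicle_type not in self.vehicle_types:
            return False
        if self.group_ids is not None and e.group_id not in self.group_ids:
            return False
        return True


def attend(entities: List[Entity], criteria: SelectionCriteria) -> List[Entity]:
    """Return only the entities that pass the filter, i.e. those that
    would be allowed to reach the agent's working memory."""
    return [e for e in entities if criteria.admits(e)]
```

For example, attend(entities, SelectionCriteria(categories={"tank"}, max_distance_m=3000.0)) would pass only tanks within 3 km and drop everything else before it is processed in detail.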

2.1.4  Zoom  Lens 

The  selection  mechanism  is  what  enables  the  zoom  lens  effect  -  to  get  the  big  picture  of 
the  battlefield  the  pilot  zooms  out  so  that  what  it  perceives  arrives  in  working  memory  as 
a  set  of  groups  of  vehicles  that  were  clustered  on  the  basis  of  proximity  to  one  another. 

To  get  this  view  the  pilot  unsets  all  the  selection  criteria  and  only  perceives  groups, 
effectively  filtering  all  the  details.  The  groups  themselves  have  attributes  that  may 
provoke  an  interest  that  results  in  zooming  in  on  individual  entities.  For  example,  groups 
have  attributes  for  center  of  mass  and  distance  from  the  pilot.  The  nearest  group  often 
presents  the  greatest  threat  to  safety,  so  the  pilot  may  zoom  in  on  individual  members  by 
selecting  that  group  for  attention.  Once  selected,  the  individual  members  of  the  group 
appear  as  perceptual  input  to  the  agent’s  working  memory. 
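The zoom-out and zoom-in steps can be illustrated with a small Python sketch. It assumes two-dimensional positions and a simple single-link proximity rule (two entities share a group if a chain of neighbors closer than a given radius connects them); the report does not specify the clustering rule, so that choice and all function names here are assumptions.

```python
import math


def cluster_by_proximity(positions, radius):
    """Group indices of points connected by chains of neighbors within radius."""
    unassigned = set(range(len(positions)))
    groups = []
    while unassigned:
        seed = min(unassigned)          # deterministic choice of starting point
        unassigned.remove(seed)
        group, frontier = [seed], [seed]
        while frontier:
            i = frontier.pop()
            near = [j for j in unassigned
                    if math.dist(positions[i], positions[j]) <= radius]
            for j in near:
                unassigned.remove(j)
                group.append(j)
                frontier.append(j)
        groups.append(sorted(group))
    return groups


def center_of_mass(positions, group):
    n = len(group)
    return (sum(positions[i][0] for i in group) / n,
            sum(positions[i][1] for i in group) / n)


def zoomed_out_view(positions, pilot, radius):
    """Low-resolution view: one record per group, no individual details."""
    view = []
    for group in cluster_by_proximity(positions, radius):
        com = center_of_mass(positions, group)
        view.append({"members": group,
                     "center": com,
                     "distance": math.dist(pilot, com)})
    return view


def zoom_in_on_nearest(view):
    """High-resolution step: select the nearest group and expose its members."""
    return min(view, key=lambda g: g["distance"])["members"]
```

In this sketch the pilot first calls zoomed_out_view to see only groups, then calls zoom_in_on_nearest to obtain the indices of the individual entities that should appear in working memory.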

2.1.5  Top-down  and  Bottom-up  Attention 

The selection mechanism supports a top-down form of attention - the pilot intentionally
directs  attention  to  the  particular  entities  or  groups  based  on  the  tactical  situation. 
Bottom-up  attention  also  plays  an  important  role  by  making  the  pilot  responsive  to  threats 
in  the  environment.  Explosions,  muzzle  blasts,  and  weapon  firing  can  all  have  an 
immediate  effect  on  perception.  Explosions  serve  to  heighten  awareness  in  the  pilot,  but 
there is no entity associated with the percepts. Muzzle blasts, however, will cause the source
entity  to  automatically  appear  in  the  pilot’s  working  memory  through  perception.  This 
bottom-up  mechanism  is  customized  for  the  military  simulation  domain.  In  some  related 
research  on  virtual  humans  we  are  currently  integrating  a  more  general  mechanism  for 
bottom-up  attention  based  on  salience  maps  (Itti  and  Koch,  2000).  Bitmap  images  are 
input  to  the  salience  maps,  which  detect  contrasts  in  color,  intensity,  and  orientation  and 
combine  their  effects  to  predict  where  attention  will  be  shifted.  To  take  advantage  of  this 
bottom-up  mechanism,  however,  it  is  necessary  to  graphically  render  the  simulation  and 
feed  the  image  to  the  salience  maps. 


2.2  Comprehension 

Perception  is  only  the  beginning  of  understanding.  The  perception  module  described 
here  produces  information  about  the  location,  velocity,  orientation  and  some  of  the 
physical  attributes  of  individual  entities  and  at  a  more  abstract  level  about  groups.  But 
when  it  comes  to  understanding  a  group’s  actions  or  its  intentions,  it  is  necessary  to 
recognize  the  spatial  and  organizational  patterns  of  the  group.  A  shortcoming  of  earlier 
versions  of  the  pilot  was  the  inability  to  recognize  the  shape  of  groups.  Groups  were 
perceived  as  clusters  of  entities  in  the  same  proximity  -  a  cluster  has  a  center  of  mass,  a 
bounding  box,  an  entity  count  (or  measure  of  density)  and  an  aggregated  velocity  and 
orientation.  What  was  lacking  in  perception  was  the  ability  to  recognize  and  differentiate 
the  formations  of  entities  -  the  pilot  could  not  tell  the  difference  between  a  column  of 
tanks and a wedge of tanks, or between a line of tanks and a staggered column. A formation can give
an  indication  of  the  intentions  of  the  force,  whether  it  is  attacking,  defending,  or  simply 
moving  from  one  area  to  another. 

Besides  interpreting  a  unit’s  tactical  formation,  it  is  also  important  to  be  able  to  infer  its 
echelon  and  type.  Discerning  that  a  group  is  really  a  company  of  supply  trucks  may  have 
a much different set of implications than if it is a battalion of tanks. Combining
organizational information with tactical formation can lead to a clearer tactical
understanding of the situation. For example, if the pilot sees thirty tanks in a
column on a road, this indicates that a tank battalion is performing a road march to some
destination. On the other hand, if the same thirty tanks were observed in a line formation
advancing across an open field, then they are probably attacking. Or, if they are in a
staggered  line  (which  may  conform  to  local  terrain  features)  but  not  moving,  their  stance 
may  be  interpreted  as  defensive. 

To  achieve  the  goal  of  tactical  understanding,  we  took  a  template-based  approach  to 
pattern  recognition  (Zhang  and  Hill,  2000a,b).  This  approach  involved  four  steps:  (1) 
constructing  a  template  database,  (2)  building  patterns  of  observed  objects,  (3)  finding  the 
best  possible  situation  template,  and  (4)  refining  the  patterns  based  on  situation 
hypotheses  and  by  collecting  more  data. 

2.2.1  Template  database 

A  template  database  was  constructed  from  the  ModSAF  formation  and  echelon  databases. 
The  ModSAF  formation  database  contains  spatial  layouts  of  units  at  the  platoon  and 
company  level  that  are  used  in  different  tactical  situations.  The  layouts  are  a  convenience 
for  ModSAF  users  who  desire  to  rapidly  instantiate  units  of  vehicles  in  tactical  formation 
-  once  the  vehicle  type  and  echelon  are  selected  the  user  can  specify  a  formation. 
ModSAF’s echelon database contains information about the composition of different
types  of  units  at  each  echelon.  For  this  study  we  focused  on  lower  level  echelons. 

To  construct  a  template  database,  we  encoded  tactical  formations  from  the  ModSAF 
database into individual templates. The template representation is based on the k-d tree
data structure (Friedman et al., 1977), which has been widely used in many areas such as
computational geometry and computer vision. This type of structure is specially
designed to encode spatial information and there are efficient algorithms for matching
k-d trees to look for correspondences. The templates are indexed based on the types and
quantities of entities contained in them.
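As a rough illustration of the two ideas in this paragraph, the Python sketch below builds a textbook two-dimensional k-d tree from a formation layout and indexes templates by the types and quantities of entities they contain. It is not the ModSAF-derived representation used in this work: the TemplateDatabase class, the dictionary-based tree nodes, and the exact-match lookup on entity counts are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_kdtree(points, depth=0):
    """Textbook k-d tree over 2-D points: split on x at even depths, y at odd."""
    if not points:
        return None
    axis = depth % 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return {
        "point": ordered[mid],
        "axis": axis,
        "left": build_kdtree(ordered[:mid], depth + 1),
        "right": build_kdtree(ordered[mid + 1:], depth + 1),
    }


def composition_key(types):
    """Index key: which entity types are present and how many of each."""
    return tuple(sorted(Counter(types).items()))


class TemplateDatabase:
    """Hypothetical store of formation templates, indexed by composition."""

    def __init__(self):
        self._index = defaultdict(list)

    def add(self, name, members):
        # members is a list of (entity_type, (x, y)) pairs in formation coordinates
        key = composition_key(t for t, _ in members)
        tree = build_kdtree([p for _, p in members])
        self._index[key].append({"name": name, "tree": tree})

    def candidates(self, observed_types):
        """Names of templates whose composition matches the observed entities."""
        key = composition_key(observed_types)
        return [entry["name"] for entry in self._index.get(key, [])]
```

A real system would likely tolerate partial matches (for example, a platoon with one vehicle hidden by terrain); the exact-match key is used here only to keep the indexing idea visible.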

2.2.2  Pattern  building 

The  process  of  recognizing  a  situation  begins  by  building  a  pattern,  which  is  a  structured 
representation  of  a  set  of  sensed  objects.  Patterns  are  built  by  retrieving  candidate 
templates  from  the  database  based  on  the  perceived  objects’  types  and  quantities.  With 
this  set  of  pre-selected  situation  templates,  the  next  step  is  to  build  a  set  of  patterns  of  the 
observed objects, one pattern per template. Building a pattern based on a template
involves creating a pattern that is as similar as possible to the tree structure stored in
the template, and then measuring the similarity between the pattern and the tree
of the template. In building a pattern, the parameters of the tree of a template, such as the
distances of entities and their spatial codes, are used as parameters to cluster the sensed
entities and organize them into another tree structure. The algorithm for building a
pattern  works  in  a  bottom-up  fashion.  It  first  retrieves  the  distance  of  the  lowest-level 
cluster  in  the  tree  of  a  template,  and  uses  the  distance  as  the  proximity  parameter  for 
clustering  the  perceived  entities.  Each  set  of  clustered  entities  is  stored  in  a  subtree,  with 
the  root  representing  the  cluster  and  its  children  the  entities  in  the  cluster.  The  algorithm 
recursively  applies  itself  to  the  clusters,  creating  a  set  of  clusters  of  clusters.  The 
algorithm  continues  until  one  cluster  is  formed.  The  output  of  this  step  is  a  set  of 
candidate  hypotheses  about  the  current  situation. 
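The bottom-up procedure can be sketched in Python as follows. The sketch assumes that each level of a template contributes one proximity distance, that clusters are formed by single-link grouping, and that a cluster is positioned at the mean of its children's centers; spatial codes are left out. These are simplifying assumptions for illustration, and build_pattern is a hypothetical name.

```python
import math


def _cluster(nodes, radius):
    """Single-link grouping of nodes whose centers are chained within radius."""
    remaining = list(range(len(nodes)))
    clusters = []
    while remaining:
        members = [remaining.pop(0)]
        frontier = list(members)
        while frontier:
            i = frontier.pop()
            near = [j for j in remaining
                    if math.dist(nodes[i]["center"], nodes[j]["center"]) <= radius]
            for j in near:
                remaining.remove(j)
            members.extend(near)
            frontier.extend(near)
        clusters.append([nodes[i] for i in members])
    return clusters


def _parent(children):
    """Create a cluster node whose center is the mean of its children's centers."""
    n = len(children)
    cx = sum(c["center"][0] for c in children) / n
    cy = sum(c["center"][1] for c in children) / n
    return {"center": (cx, cy), "children": children}


def build_pattern(positions, level_distances):
    """Build a tree over sensed entities, lowest template level first.

    level_distances holds the proximity parameter taken from each level of
    the template tree, ordered from the lowest level upward.
    """
    nodes = [{"center": p, "children": []} for p in positions]
    for radius in level_distances:
        if len(nodes) == 1:
            break
        nodes = [_parent(group) for group in _cluster(nodes, radius)]
    if len(nodes) > 1:
        # Template levels exhausted: join what is left under a single root.
        nodes = [_parent(nodes)]
    return nodes[0]
```

Running build_pattern once per candidate template, each time with that template's distances, yields one pattern per template, which is the set of hypotheses passed on to the matching step.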

2.2.3  Selecting  the  Closest  Match 

To identify the best hypothesis, the algorithm measures the similarity between each
pattern and its corresponding template, then compares the similarities of all patterns to
choose the best one. In this work, we developed a domain-independent
measuring method. The similarity of a pattern to its template is measured as
follows. A mismatch between two nodes, one from the pattern and the other from the
tree of the template, will incur a penalty. The penalty is greater if the nodes are higher in
their trees, since a node higher in a tree represents an organization at a higher level of the
organizational hierarchy and a mismatch between two larger organizations has a
bigger impact on the overall matching. In our system, we experimentally adjust a set of
penalty  weights  for  mismatches  at  different  depths.  In  the  end,  the  algorithm  ranks  the 
situation  hypotheses  based  on  their  measure  of  similarity. 
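A depth-weighted penalty of this kind might look like the Python sketch below. The report does not define what counts as a mismatch between two nodes, so the sketch assumes the simplest case: two nodes mismatch when they have different numbers of children, and the penalty is the count difference times a weight for that depth. The weight values and function names are illustrative only.

```python
def mismatch_penalty(pattern, template, weights, depth=0):
    """Sum depth-weighted penalties over corresponding nodes of two trees.

    weights[0] applies at the root; deeper levels use later entries, and
    the last entry is reused if the trees are deeper than the weight list.
    Children are paired in order; unpaired children are accounted for only
    through the count difference at their parent.
    """
    weight = weights[min(depth, len(weights) - 1)]
    p_children = pattern["children"]
    t_children = template["children"]
    penalty = weight * abs(len(p_children) - len(t_children))
    for p_child, t_child in zip(p_children, t_children):
        penalty += mismatch_penalty(p_child, t_child, weights, depth + 1)
    return penalty


def rank_hypotheses(hypotheses, weights):
    """Sort (name, pattern, template) hypotheses from best to worst match."""
    scored = [(name, mismatch_penalty(pattern, template, weights))
              for name, pattern, template in hypotheses]
    return sorted(scored, key=lambda item: item[1])
```

With weights such as [4.0, 2.0, 1.0], a missing platoon at the root level costs more than a missing vehicle inside one platoon, which mirrors the reasoning that mismatches between larger organizations matter more.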

2.2.4  Hypothesis  Refinement 

The  best  hypothesis  may  not  inspire  a  high  degree  of  confidence  if  the  measure  of 
similarity  is  low.  In  these  cases  it  may  be  necessary  to  collect  more  information,  which  is 
accomplished  by  looking  around.  Where  the  pilot  looks  and  what  information  is  sought 
will  depend  on  the  set  of  hypotheses  that  need  to  be  either  confirmed  or  disproved.  We 
did  not  experiment  with  this  process,  but  it  would  be  the  next  logical  step  to  take  in  this 
line of research. Thus, one of the motivations for visual search, which is an important
perceptual process related to attention, is the need to refine hypotheses
about the current situation.


2.3  Prediction 

A  pilot  needs  to  be  able  to  anticipate  where  to  look  when  performing  visually  oriented 
tasks  in  a  dynamic  environment.  This  equates  to  an  ability  to  make  short-term 
predictions  about  the  direction  that  a  mobile  agent  will  travel  as  it  traverses  complex 
terrain. For a number of tasks, the pilot’s visual attention is divided between tracking one
or more vehicles and scanning the environment for information. Accomplishing this
involves shifting the pilot’s gaze from the vehicle(s) being tracked to other objects in the
environment.  Reacquiring  these  highly  mobile  vehicles  can  be  difficult  when  one’s 
attention  is  momentarily  diverted  elsewhere.  Looking  away  even  for  a  few  seconds  can 
result  in  losing  track  of  the  vehicle  since  it  can  move  hundreds  of  feet  in  a  short  time.  To 
make  it  easier  to  visually  reacquire  a  moving  vehicle,  we  wanted  to  enable  the  pilot  to 
make  short-term  predictions  of  where  the  vehicle  will  be  located  up  to  seven  seconds  in 
the future. With this prediction the pilot would be able to shift his gaze back to
approximately  the  right  place  to  reacquire  the  target  object.  But  projecting  the  direction 
and  location  of  a  vehicle  is  not  a  simple  matter — terrain  features  such  as  rivers  and 
mountains,  and  cultural  features  such  as  roads  and  bridges,  can  strongly  influence  the 
path  taken  by  a  driver.  We  do  not  believe  it  is  sufficient  to  predict  a  vehicle’s  location  by 
making  a  simple  linear  projection.  A  driver  may  choose  to  turn  at  a  road  intersection  or 
change  direction  to  avoid  a  natural  obstacle  such  as  a  lake  or  a  steep  mountain.  Moving  at 
48  kilometers/hour,  a  vehicle  can  cover  100  meters  in  the  short  time  the  pilot  glances 
away,  or  worse,  the  vehicle  may  change  direction  and  end  up  someplace  unexpected.  In 
either  case,  the  observer  needs  to  reacquire  the  visual  target  with  a  minimal  amount  of 
search. 
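The figures in the paragraph above can be checked with a few lines of Python, which also show the simple linear projection that serves as the baseline being argued against. The heading convention (degrees counter-clockwise from the x-axis) and the function names are assumptions for the example.

```python
import math


def kmh_to_ms(speed_kmh):
    """Convert kilometers per hour to meters per second."""
    return speed_kmh * 1000.0 / 3600.0


def distance_covered_m(speed_kmh, seconds):
    """Distance travelled at constant speed."""
    return kmh_to_ms(speed_kmh) * seconds


def linear_projection(position, heading_deg, speed_kmh, seconds):
    """Baseline prediction: continue in a straight line at constant speed.

    It ignores roads, rivers and other terrain, which is exactly the
    limitation discussed in the text.
    """
    d = distance_covered_m(speed_kmh, seconds)
    heading = math.radians(heading_deg)
    return (position[0] + d * math.cos(heading),
            position[1] + d * math.sin(heading))
```

At 48 kilometers/hour a vehicle moves about 13.3 meters per second, so it covers 100 meters in 7.5 seconds, and roughly 93 meters within a seven-second glance away.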

We  hypothesized  that  the  environment  provides  visual  cues  that  can  be  used  for  making 
short-term  predictions  about  a  mobile  agent’s  location  without  taking  into  account  its 
goals  or  intentions.  While  knowledge  of  an  agent’s  intentions  may  also  be  useful  in 
making  such  predictions,  that  approach  was  not  the  focus  of  this  study.  Instead,  we 
implemented  a  neural  network  that  takes  as  input  a  set  of  terrain  features  in  the  vicinity  of 
a  mobile  agent,  and  with  this  information  it  generates  a  probability  vector  that  predicts 
the  likelihood  that  the  agent  will  travel  in  each  of  fifteen  different  directions.  This  is 
transformed into a prediction about the agent’s future location, along with the time period
for which it is valid. We integrated the prediction capability with the perceptual system of our
Soar-based  helicopter  pilot. 
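The step from a probability vector to a predicted location can be sketched as below. The neural network itself is not reproduced; the sketch starts from its output. Several details are assumptions, since the report does not give them: the fifteen directions are taken to be evenly spaced bearings (24 degrees apart), the most probable direction is chosen, speed is constant, and the validity period is scaled by the peak probability up to the seven-second maximum mentioned earlier. That scaling rule in particular is purely illustrative.

```python
import math

N_DIRECTIONS = 15        # size of the network's output vector (from the text)
MAX_HORIZON_S = 7.0      # longest look-ahead mentioned in the text


def predict_location(position, speed_ms, probabilities):
    """Turn a direction-probability vector into (predicted_xy, valid_seconds).

    Assumed conventions: direction i points at bearing i * 24 degrees,
    measured counter-clockwise from the x-axis, and the prediction is
    trusted for a time proportional to the peak probability.
    """
    if len(probabilities) != N_DIRECTIONS:
        raise ValueError("expected one probability per direction")
    if abs(sum(probabilities) - 1.0) > 1e-6:
        raise ValueError("probabilities must sum to 1")

    best = max(range(N_DIRECTIONS), key=lambda i: probabilities[i])
    bearing = math.radians(best * 360.0 / N_DIRECTIONS)
    horizon_s = MAX_HORIZON_S * probabilities[best]
    d = speed_ms * horizon_s
    predicted = (position[0] + d * math.cos(bearing),
                 position[1] + d * math.sin(bearing))
    return predicted, horizon_s
```

Under these assumptions a confident output (one direction with probability near 1) yields a long look-ahead, while a flat distribution yields a short one, so the pilot would have to look back at the vehicle sooner.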

For  this  study  we  investigated  the  influence  of  seven  terrain  features:  mountains,  hard 
roads, soft roads, passable water, impassable water, buildings and forests. All of these
features were available in a ModSAF terrain database: by querying a specific location in
the terrain one can find out whether one or more of these features are present. Because
of  this  way  of  storing  information,  we  had  to  take  samples  of  the  areas  of  interest  rather 
than  doing  an  exhaustive  query  of  the  space. 

We  tested  the  system  on  a  number  of  scenarios,  including  one  where  a  tank  follows  a 
road  through  a  mountain  area  with  forests.  The  algorithm  accurately  predicts  that  the  tank 
will follow the road. When we compared the actual and predicted locations
over time, the points showed very little deviation and corresponded to the road shape.
This means that the road terrain feature affected the tank’s movement and the pilot was
able to accurately predict this movement.

Another general class of scenarios involves situations where the tank is travelling
cross-country (i.e., it is not following a road). In one of these scenarios the tank came close to
a river and had to find a bridge in order to cross it. As the tank drove within 300 meters
of the river, the predictability decreased significantly. This was because the
terrain database did not provide bridge information, so the algorithm recognized that the river
would moderate the tank’s behavior, but it could not predict how. It eventually predicted
that the tank would cross the river at a point where it was narrow. With the shorter
prediction time it became necessary for the pilot to track the mobile agent more closely
until it was in a place where predictions could be made more confidently.


Speed of the target object    Error
10 km/h                       ± 4 meters
20 km/h                       ± 8.5 meters
30 km/h                       ± 10 meters
48 km/h                       ± 15 meters

Table 1. Target Speed versus Error

We  ran  the  algorithm  on  mobile  agents  moving  at  different  speeds  to  test  the  accuracy  of 
the  predictions.  The  mobile  agent’s  speed  is  assumed  to  be  constant  for  any  given 
prediction.  Table  1  shows  the  maximum  error  between  the  predicted  and  actual  positions. 
Given  that  the  distance  between  the  virtual  pilot  and  the  mobile  agent  ranged  from  1  to  4 
km,  the  pilot  should  have  very  little  difficulty  reacquiring  their  targets.^ 
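One way to see why these errors are small at the stated viewing distances is to convert them to an angular offset in the pilot's field of view. The Python sketch below does this for the Table 1 values, under the worst-case assumption that the whole position error lies perpendicular to the line of sight; the function name is hypothetical.

```python
import math

# Maximum position errors from Table 1, keyed by target speed in km/h.
MAX_ERROR_M = {10: 4.0, 20: 8.5, 30: 10.0, 48: 15.0}


def angular_error_deg(error_m, range_m):
    """Angle subtended by a position error seen from a given range,
    assuming the error is entirely perpendicular to the line of sight."""
    return math.degrees(math.atan2(error_m, range_m))


if __name__ == "__main__":
    for speed_kmh, error_m in MAX_ERROR_M.items():
        near = angular_error_deg(error_m, 1000.0)   # pilot 1 km away
        far = angular_error_deg(error_m, 4000.0)    # pilot 4 km away
        print(f"{speed_kmh} km/h: {near:.2f} deg at 1 km, {far:.2f} deg at 4 km")
```

Even the largest error in the table, 15 meters seen from 1 km, corresponds to less than one degree of gaze offset (about 0.86 degrees).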


3.  Scientific,  military,  and  commercial  impact  of  accomplishments 

Virtual  humans,  like  real  humans,  need  to  perceive  their  environment.  They  need  to  be 
capable  of  processing  perceptual  information,  focusing  their  visual  attention, 
comprehending  situations  and  the  behaviors  of  others,  and  predicting  where  to  direct  their 
attention  when  tracking  and  reacquiring  multiple  objects.  Humans  develop  a  knack  for 
directing  their  gaze  toward  the  right  objects  at  the  right  time — virtual  humans  need  the 
same  kind  of  ability.  But  for  an  increasing  number  of  applications,  a  virtual  human’s 
gaze  behaviors  must  go  beyond  serving  situational  perceptual  needs — they  must  also  be 
realistic-looking to human observers. This is particularly true in social situations, where
the direction of one’s gaze plays a significant role in communication (Cassell and
Vilhjalmsson,  1999;  Argyle  and  Cook,  1976).  This  area  of  research  will  be  fertile  ground 
for  developing  and  testing  scientific  hypotheses  about  human  perceptual  attention  in 
virtual  humans. 


^  We  have  only  recently  begun  to  test  this  hypothesis  and  this  will  be  the  subject  of  future  work. 


The  impact  on  military  applications  is  clear — there  is  an  increasing  interest  in  modeling 
human  behavior  and  in  the  creation  of  virtual  humans  for  mission  rehearsal.  We  have 
addressed  the  issue  of  how  to  control  a  virtual  human’s  perceptual  focus  using  an 
intrinsic  model  of  perceptual  attention  and  cognition  so  that  the  resulting  behavior  is  both 
realistic  and  believable.  We  also  addressed  how  perception  can  support  comprehension 
by mapping visual patterns onto doctrinal templates of unit tactical formations and
echelons for military domains.

In  addition  to  military  applications,  this  research  will  also  potentially  impact  the 
development  of  virtual  humans  in  commercial  applications  such  as  computer  games, 
interactive  narrative,  and  even  motion  pictures  that  use  animated  characters.  In  order  to 
create  a  sense  of  immersion,  computer  games  and  interactive  narrative  both  require 
realistic  and  believable  behavior.  There  is  a  tension  between  what  a  model  of  perceptual 
attention  can  generate  and  whether  it  is  believable  when  observed  by  a  human.  At  one 
extreme,  animated  characters  can  portray  a  believable  human  character  and  yet  have  no 
autonomy or cognition at all. An example of believable characters with no cognition can
be seen in the Final Fantasy movie: these characters employ believable
emotional expressions and gestures, but it is all driven by a human animator, who decides
every  nuance  of  the  performance.  The  problem  with  this  approach,  however,  is  that 
characters  cannot  dynamically  interact  with  the  environment  or  with  other  agents.  At  the 
other  extreme,  a  virtual  human  could  potentially  be  developed  with  a  very  realistic  model 
of perceptual attention, but fail to create a believable human character, since its gaze
behaviors are purely functional and may not be socially or emotionally moderated.

This  work  attempts  to  address  the  issue  of  believability  and  realism:  (1)  it  is  grounded  in 
psychological  theories  of  perception  and  cognition,  especially  concerning  the  control  of 
attention  and  perceptual  grouping.  (2)  It  is  more  believable  than  current  models  from  the 
standpoint  of  providing  rich  representations  of  the  world  while  remaining 
computationally  tractable. 


4.  Technology  Transfer 

Portions  of  this  research  have  already  been  transferred  to  the  research  group  at  the 
University  of  Southern  California’s  (USC)  Institute  for  Creative  Technologies  (ICT). 

The  ICT  is  building  a  virtual  human  for  use  in  the  Mission  Rehearsal  Exercise  (MRE) 
Project.  The  perceptual  system  of  this  virtual  human  is  being  modeled  after  the  work 
performed under this grant. Furthermore, the work on situation awareness is currently
being extended in two areas. First, we are investigating how to use saliency maps (Itti
and Koch, 2000) as the underlying mechanism for bottom-up attention in the agent’s
perceptual  system.  Itti’s  saliency  maps  detect  differences  in  luminance,  color,  and 
orientation  and  will  soon  also  detect  motion.  We  have  already  integrated  Itti’s  saliency 
map  code  into  the  virtual  human  and  used  it  to  analyze  scenes  from  the  Mission 
Rehearsal  Exercise.  The  next  research  step  will  be  to  investigate  how  to  integrate  top- 
down,  task-oriented  attention  with  bottom-up  attention.  The  second  area  where  we  are 
extending situation awareness is by enabling the virtual humans to build a cognitive map
of the space they have explored. The cognitive map is based on what is perceivable
rather  than  being  built  from  ground  truth.  This  differs  from  many  military  simulations 
where  the  virtual  humans  plan  from  a  map  but  cannot  perceive  anything  in  the 
environment  except  other  entities.  We  have  adapted  an  algorithm  developed  by  Yeap  and 
Jeffries  (1999)  for  computing  local  representations  of  the  environment  based  on  the 
perceptual  perspective. 


5.  ONR  Database  Statistics 

Publications (See Appendix A for a detailed list of publications and presentations)

0  -  Number  of  Papers  Published  in  Refereed  Journals  Supported  by  ONR 
0  -  Number  of  Books  or  Chapters  Published  Supported  by  ONR 

7  -  Number  of  Refereed  Conference  Papers  Supported  by  ONR 

1  -  Number  of  Technical  Reports  &  Non-Refereed  Papers  Supported  by  ONR 
0  -  Number  of  Patents  Issued 
0  -  Number  of  Patents  Pending 

8  -  Number  of  Presentations 

1  -  Number  of  Degrees  Granted 


PI/CoPI Information

                Women  Men
Minority            0    0
Non-Minority        0    1
Total               0    1

Grad Students Information

                Women  Men
Minority            0    0
Non-Minority        0    1
Total               0    1

Post Doctoral Information

                Women  Men
Minority            0    0
Non-Minority        0    0
Total               0    0


Awards 

Dr.  Randall  W.  Hill,  Jr.,  University  of  Southern  California  Information  Sciences  Institute 
Best Paper Award for "Modeling Perceptual Attention in Virtual Humans." Eighth
Conference  on  Computer  Generated  Forces  and  Behavioral  Representation,  Orlando,  FL, 
May  1999. 